The Robust Clustering with Reduction Dimension

Author

  • Dyah E. Herwindiati
Abstract

Clustering is the process of identifying homogeneous groups of objects, called clusters, and is an interesting topic in data mining. Objects in the same group or class have similar characteristics. This paper discusses a robust clustering process for image data using two dimension reduction approaches: two-dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to handling high-dimensional data is dimension reduction, which transforms the data into a lower-dimensional space with limited loss of information. One of the most common forms of dimensionality reduction is PCA. 2DPCA is often described as a variant of PCA in which the image matrices are treated directly as 2D matrices; they do not need to be transformed into vectors, so the image covariance matrix can be constructed directly from the original image matrices. The classical covariance matrix that is decomposed is very sensitive to outlying observations. The objective of the paper is to compare the performance of robust minimizing vector variance (MVV) in 2DPCA and in PCA for clustering arbitrary image data when outliers are hidden in the data set. The simulation aspects of robustness and an illustration of image clustering are discussed at the end of the paper.

Keywords: Breakdown point, Consistency, 2DPCA, PCA, Outlier, Vector Variance
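The difference between the two projections described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: it uses the classical (non-robust) covariance, not the MVV estimator the paper studies, and the function names `pca_project` and `twod_pca_project`, the random test images, and the choice `k = 3` are all hypothetical.

```python
import numpy as np

def pca_project(images, k):
    """Classical PCA: flatten each image to a vector, then project (hypothetical helper)."""
    n = images.shape[0]
    X = images.reshape(n, -1)                 # (n, h*w): one long vector per image
    Xc = X - X.mean(axis=0)                   # center the vectorized images
    # Rows of Vt are the principal directions of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                      # (n, k) score matrix

def twod_pca_project(images, k):
    """2DPCA: build the image covariance directly from the 2D matrices (hypothetical helper)."""
    n = images.shape[0]
    centered = images - images.mean(axis=0)   # (n, h, w), no vectorization needed
    # G = (1/n) * sum_i (A_i - mean)^T (A_i - mean), a w x w matrix
    G = np.einsum('nij,nik->jk', centered, centered) / n
    _, eigvecs = np.linalg.eigh(G)            # eigenvalues returned in ascending order
    W = eigvecs[:, ::-1][:, :k]               # keep the k leading eigenvectors
    return images @ W                         # (n, h, k): one feature matrix per image

# Assumed toy data: 20 random "images" of size 8 x 6
rng = np.random.default_rng(0)
images = rng.normal(size=(20, 8, 6))
pca_scores = pca_project(images, k=3)
twod_features = twod_pca_project(images, k=3)
```

Note the size difference: PCA decomposes a covariance over all `h*w` pixels, while 2DPCA only decomposes a `w x w` matrix. In both cases a robust variant would replace the classical mean and covariance with robust estimates before the decomposition; that step is not shown here.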


Similar resources

Dimension Reduction for Clustering Time Series Using Global Characteristics

Existing methods for time series clustering that rely on the actual data values can become impractical, since they do not easily handle datasets with high dimensionality, missing values, or different lengths. In this paper, a dimension reduction method is proposed that replaces the raw data with some global measures of time series characteristics. These measures are then clustered using a self-o...

A robust wavelet based profile monitoring and change point detection using S-estimator and clustering

Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications, due to the complexity of many processes, it is not usually possible to model a process using parametric models. In these cas...

Robust variable selection through MAVE

Dimension reduction and variable selection play important roles in high-dimensional data analysis. The sparse MAVE, a model-free variable selection method, is a nice combination of shrinkage estimation, Lasso, and an effective dimension reduction method, MAVE (minimum average variance estimation). However, it is not robust to outliers in the dependent variable because of the use of least-squares...

An Investigation of the Effect of Freezing on Strength and Durability of Dimension Stones Using Fuzzy Clustering Technique and Statistical Analysis

Western and North-Western regions of Iran feature very cold winters, a lot of snow, and freezing temperatures during most nights in December, January, February, and March. This directly influences the selection and applications of dimension stones in these areas. Freezing influences both mechanical and physical properties of rocks. Therefore, measuring the changes in values of these parameters ...

Robust Discriminative Clustering with Sparse Regularizers

Clustering high-dimensional data often requires some form of dimensionality reduction, where clustered variables are separated from “noise-looking” variables. We cast this problem as finding a low-dimensional projection of the data which is well-clustered. This yields a one-dimensional projection in the simplest situation with two clusters, and extends naturally to a multi-label scenario for mo...


Publication date: 2012